Amide Spectral Fingerprints are Hydrogen Bonding-Mediated

The origin of the peculiar amide spectral features of proteins in aqueous solution is investigated by exploiting a combined theoretical and experimental approach to study UV Resonance Raman (UVRR) spectra of peptide molecular models, namely N-acetylglycine-N-methylamide (NAGMA) and N-acetylleucine-N-methylamide (NALMA). UVRR spectra are recorded by tuning synchrotron radiation at several excitation wavelengths and modeled by using a recently developed multiscale protocol based on a polarizable QM/MM approach. Thanks to the unparalleled agreement between theory and experiment, we demonstrate that specific hydrogen bond interactions, which dominate hydration dynamics around these solutes, play a crucial role in the selective enhancement of amide signals. These results further support the capability of vibrational spectroscopy methods as valuable tools for refined structural analysis of peptides and proteins in aqueous solution.


Samples
The dipeptides N-acetyl-leucine-methylamide (NALMA) and N-acetyl-glycine-methylamide (NAGMA) were purchased from Bachem and used without further purification. Both are micro-crystalline powders. No relevant contamination by water was found in the crystalline peptide powders, as deduced from the absence in the Raman spectra of any signal attributable to the intense OH stretching band of water. The aqueous solutions of the dipeptides were prepared by dissolving NAGMA or NALMA in doubly distilled deionized water to reach the desired concentrations, typically corresponding to a molar ratio of 1:336 peptide:H2O. All solutions were freshly prepared before the measurements. Hydrated powders of NAGMA and NALMA were prepared by adding a controlled amount of water to the dry peptides to reach a molar ratio of about 1:10 peptide:H2O. The exact amount of added hydration water was determined by weighing the samples.
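As a rough check of what these molar ratios mean by mass, the sketch below converts a 1:N peptide:H2O molar ratio into grams of water per gram of peptide. It assumes the molecular formulas C5H10N2O2 for NAGMA and C9H18N2O2 for NALMA and rounded standard atomic weights; the function names are illustrative and not part of the experimental protocol.

```python
# Rounded standard atomic weights in g/mol.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molar_mass(composition):
    """Molar mass (g/mol) from a dict mapping element symbol to atom count."""
    return sum(ATOMIC_WEIGHT[el] * n for el, n in composition.items())


M_WATER = molar_mass({"H": 2, "O": 1})
M_NAGMA = molar_mass({"C": 5, "H": 10, "N": 2, "O": 2})  # assumed formula
M_NALMA = molar_mass({"C": 9, "H": 18, "N": 2, "O": 2})  # assumed formula


def water_mass_per_gram(peptide_molar_mass, waters_per_peptide):
    """Grams of water per gram of peptide for a 1:N peptide:H2O molar ratio."""
    return waters_per_peptide * M_WATER / peptide_molar_mass


for name, mass in (("NAGMA", M_NAGMA), ("NALMA", M_NALMA)):
    for ratio in (336, 10):  # solution and hydrated-powder ratios from the text
        grams = water_mass_per_gram(mass, ratio)
        print(f"{name} 1:{ratio} -> {grams:.2f} g water per g peptide")
```

Under these assumptions, the 1:10 hydrated powders contain roughly as much water as peptide by mass, whereas the 1:336 solutions are dilute.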

Out of resonance Raman measurements
Visible Raman spectra were collected on the microcrystalline forms and on the aqueous solutions of the dipeptides with a Raman setup (Horiba-JobinYvon, LabRam Aramis) in backscattering geometry, using exciting radiation at 632.8 nm provided by a He-Ne laser and at 532 nm provided by a solid-state laser. The spectral resolution was set at about 1 cm−1.

SR-UVRR measurements
Synchrotron-based UV Resonance Raman (SR-UVRR) spectra were collected at the BL10.2-IUVS beamline of Elettra-Sincrotrone Trieste (Italy) using the experimental setup described in detail in ref 1. The UVRR spectra were acquired at different excitation wavelengths in the deep UV range provided by the emission of synchrotron radiation (SR). The excitation energy was set by regulating the undulator gap aperture and using a Czerny-Turner monochromator (Acton SP2750, focal length 750 mm, Princeton Instruments, Acton, MA, USA) equipped with a holographic grating with 3600 grooves/mm to monochromatize the incoming SR. The Raman signal was collected in backscattering geometry with a single-pass Czerny-Turner spectrometer of 750 mm focal length, equipped with holographic gratings of 1800 grooves/mm and 3600 grooves/mm. The resolution was set at different values depending on the excitation wavelength (e.g. 2.8 cm−1/pixel at 210 nm and 1.8 cm−1/pixel at 266 nm). The calibration of the spectrometer was standardized using cyclohexane (spectroscopic grade, Sigma Aldrich). The final radiation power on the samples was kept between a few and tens of µW. Any possible photo-damage effect due to prolonged exposure of the sample to UV radiation was avoided by continuously spinning the sample cell during the measurements.
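Because the resolution is quoted in wavenumbers while the excitation is quoted in nanometres, it can help to see how a Raman shift maps onto an absolute scattered wavelength. The sketch below shows the conversion for Stokes scattering; the 1650 cm−1 shift is only an illustrative value chosen for the example, and the function name is hypothetical.

```python
def scattered_wavelength_nm(excitation_nm, shift_cm1):
    """Wavelength (nm) of Stokes-scattered light for a given Raman shift (cm^-1)."""
    excitation_cm1 = 1.0e7 / excitation_nm  # nm -> absolute wavenumber in cm^-1
    return 1.0e7 / (excitation_cm1 - shift_cm1)


# Excitation wavelengths quoted in the text; the shift is an illustrative value.
for excitation in (210.0, 266.0):
    wavelength = scattered_wavelength_nm(excitation, 1650.0)
    print(f"{excitation:.0f} nm excitation, 1650 cm^-1 shift -> {wavelength:.2f} nm")
```

The same wavenumber interval spans a narrower wavelength interval at 210 nm than at 266 nm, which is one reason the dispersion per pixel in cm−1 differs between excitation wavelengths.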

Conformational studies
Potential NAGMA and NALMA conformers were initially searched on 2-dimensional potential energy surfaces (PES), built by scanning the Φ and Ψ backbone dihedral angles (see Figure S1) from 0° to 360° in steps of 10° at the B3LYP/6-31+G(d) level of theory in vacuo and in combination with the Polarizable Continuum Model (PCM), using the Gaussian 16 program. (2) The conformational preferences of these two blocked amino acids have been extensively studied in the gas phase and in solution, most notably the C7 and C5 conformers, which arise from intramolecular hydrogen bonds (HBs) that lead to the formation of seven- or five-membered rings. (3,4,5,6,7,8,9) The β2, C5 and C7 conformers reported in the literature were identified as the lowest energy minima. We also considered dimeric forms of NAGMA and NALMA. Optimizations and frequency calculations were carried out at the B3LYP/6-311++G(d,p) level for all conformers (and dimers) and for the structure designed by saturating the potential hydrogen bond (HB) sites in the dipeptides. In what follows, that motif will be called "Supermolecule + 4W". Orbital interactions associated with hydrogen bonding were analyzed in the Natural Bond Orbitals (NBO) framework (10,11), and the interaction energies were obtained via second-order perturbation corrections to the Fock matrix with the NBO7 program. (12)

Figure S1: Dipeptide model dihedral angle representations
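For readers who want to reproduce the scan coordinates from Cartesian geometries, the following sketch computes a signed dihedral angle from four atomic positions and builds the 10° scan grid. It is a generic geometric routine, not the procedure used internally by Gaussian 16; the choice of which four atoms define each backbone dihedral is left to the user, and the function name is hypothetical.

```python
import numpy as np


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points, in (-180, 180]."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)       # unit vector along the central bond
    v = b0 - np.dot(b0, b1) * b1       # b0 projected onto the plane normal to b1
    w = b2 - np.dot(b2, b1) * b1       # b2 projected onto the same plane
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


# Scan values for one dihedral: 0 to 360 degrees in steps of 10 degrees.
scan_angles = np.arange(0, 361, 10)
print(len(scan_angles), "values per angle,", len(scan_angles) ** 2, "grid points")
print(dihedral_deg([1, 0, 0], [0, 0, 0], [0, 0, 1], [-1, 0, 1]))  # trans arrangement
```

Note that 0° and 360° describe the same geometry, so a scan that includes both endpoints evaluates the periodic boundary twice.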

MD simulations
Starting from the β2 conformer, 30 ns molecular dynamics (MD) simulations of the dipeptides in solution were performed with the GROMACS code (13). Geometrical and Lennard-Jones parameters for NAGMA and NALMA were obtained from the General Amber Force Field (14), and partial atomic charges were derived with the Charge Model 5 (15). Virtual sites were added to the carbonyl groups in order to improve the directionality of the hydrogen bonds formed with the solvent molecules. The cubic simulation boxes were filled with TIP3P water molecules. (16) After energy minimization and NVT and NPT equilibrations, the MD simulation was performed at 300 K and 1 atm with periodic boundary conditions, particle-mesh Ewald summation (17) for long-range electrostatics, a 12 Å cutoff for nonbonded interactions, constrained X-H bonds, and a 2 fs time step. A modified Berendsen thermostat (18) and a Parrinello-Rahman barostat (19), each with a coupling constant τ of 0.1 ps, were employed to maintain temperature and pressure. Atomic coordinates were saved every 10 ps. We skipped the first 10 ns of the trajectory and extracted about 200 snapshots, selecting the closest water molecules within a radius of 8 Å of the peptide to obtain solute concentrations of ≈20 mg/mL (corresponding to a molar ratio of 1:336 peptide:H2O), similar to those in the experiments. Hydration patterns were analyzed with the TRAVIS package (20,21). The clustering method reported in Ref. 22 was used to identify similar conformations sampled during the MD run. Cutoffs were set to the averages of the RMSD.
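The snapshot extraction described above can be sketched with plain NumPy as follows. Two assumptions are made that the text does not state: the roughly 200 snapshots are taken at evenly spaced times, and a water molecule is kept when its oxygen lies within 8 Å of any solute atom (coordinates are assumed to be already unwrapped around the solute, so periodic images are not handled). Function names are hypothetical.

```python
import numpy as np


def snapshot_times_ps(total_ns=30.0, skip_ns=10.0, save_every_ps=10.0, n_snapshots=200):
    """Evenly spaced frame times (ps) after discarding the equilibration part."""
    times = np.arange(skip_ns * 1000.0, total_ns * 1000.0 + 0.5 * save_every_ps,
                      save_every_ps)
    stride = max(1, len(times) // n_snapshots)
    return times[::stride][:n_snapshots]


def waters_within(solute_xyz, water_o_xyz, radius=8.0):
    """Indices of water oxygens closer than `radius` (Angstrom) to any solute atom."""
    solute_xyz = np.asarray(solute_xyz, dtype=float)
    water_o_xyz = np.asarray(water_o_xyz, dtype=float)
    diff = water_o_xyz[:, None, :] - solute_xyz[None, :, :]
    nearest = np.linalg.norm(diff, axis=2).min(axis=1)  # distance to closest solute atom
    return np.nonzero(nearest <= radius)[0]


times = snapshot_times_ps()
print(len(times), "snapshots, spacing", times[1] - times[0], "ps")
```

With the simulation settings quoted in the text, this spacing corresponds to one snapshot every 100 ps over the last 20 ns of the trajectory.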

Calculations of the Spectra
We performed optimizations on geometries coming from the MD trajectories, and then carried out TD-DFT, frequencies and Raman calculations. In those calculations, solvent effects were described by means of the Quantum Mechanics/Fluctuating Charges (QM/FQ) model (23), applied to each of the extracted snapshots. Different FQ parameterizations (24,25) were exploited.
For the Resonance Raman (RR) spectra, we used the Franck-Condon Vertical Gradient (FC|VG) approach, in which the vibrational frequencies and normal modes of the excited state are assumed to be the same as those of the ground state, and the transition dipole moments are considered to be independent of the molecular geometry. Only the energy gradients are computed for the excited state. (26) We calculated the rotational invariants of the polarizability tensor using the most common set, which consists of the mean polarizability a, the antisymmetric anisotropy δ and the anisotropy g. For symmetric tensors, δ = 0. The invariants a and g are defined, using the summation convention, in terms of the components α_ab of the complex Raman polarizability tensor. These invariants are related to the scattered intensity by means of equations defining the cross section for given experimental setups. For the commonly employed 90° scattering geometry, the Raman differential cross section, σ_i, is written in terms of these invariants and of the frequency ω of the incident light. (27,28) In such a geometry, the incident radiation is linearly polarized perpendicular to the scattering plane, and all scattered polarizations are detected at an angle of 90° with respect to the incident radiation direction.
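As an illustration of how the rotational invariants can be evaluated from a 3×3 complex polarizability tensor, the sketch below uses the standard textbook definitions, assumed here to match the ones intended in the text: a² = (1/9) α^s_aa α^s*_bb, g² = (1/2)(3 α^s_ab α^s*_ab − α^s_aa α^s*_bb) and δ² = (3/2) α^a_ab α^a*_ab, where α^s and α^a are the symmetric and antisymmetric parts of the tensor. For perpendicular incident polarization and detection of all scattered polarizations at 90°, the intensity is proportional to (45a² + 7g² + 5δ²)/45; the frequency-dependent prefactor of the cross section is deliberately left out. Function names are hypothetical.

```python
import numpy as np


def raman_invariants(alpha):
    """Return (a2, g2, delta2) for a 3x3, possibly complex, polarizability tensor."""
    alpha = np.asarray(alpha, dtype=complex)
    sym = 0.5 * (alpha + alpha.T)    # symmetric part
    anti = 0.5 * (alpha - alpha.T)   # antisymmetric part
    trace_sq = abs(np.trace(sym)) ** 2
    a2 = trace_sq / 9.0
    g2 = 0.5 * (3.0 * np.sum(np.abs(sym) ** 2) - trace_sq)
    delta2 = 1.5 * np.sum(np.abs(anti) ** 2)
    return float(a2), float(g2), float(delta2)


def intensity_factor_90deg(alpha):
    """Invariant combination for 90-degree scattering, without the frequency prefactor."""
    a2, g2, delta2 = raman_invariants(alpha)
    return (45.0 * a2 + 7.0 * g2 + 5.0 * delta2) / 45.0


print(raman_invariants(np.diag([1.0, 0.0, 0.0])))
```

For a real symmetric tensor the antisymmetric part vanishes and δ² = 0, in line with the statement in the text.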
The reported averaged absorption, Raman and RR spectra were obtained by convoluting peak intensities with Gaussian or Lorentzian functions. For the Gaussian functions we used a full width at half maximum (FWHM) of 0.5 eV, whereas for the Lorentzian ones we chose a FWHM of 20 cm−1. RR excitation profiles were also computed by scanning different excitation wavelengths, chosen on the basis of the UV-Vis absorption spectra. All QM/FQ spectra calculations were conducted using a locally modified version of the Gaussian 16 package at the B3LYP/6-311++G(d,p) level. In the case of QM/FQ RR spectra, some extra computations were carried out with the CAM-B3LYP, PBE0 and M06-2X functionals, and B3LYP turned out to perform better for the particular systems studied in this work.
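A minimal sketch of the broadening and averaging step is given below, assuming each snapshot provides a stick spectrum (peak positions and intensities) and that the averaged spectrum is the plain arithmetic mean of the broadened snapshot spectra. The area-normalized Lorentzian and the function names are illustrative choices, not necessarily those of the locally modified Gaussian 16 workflow.

```python
import numpy as np


def lorentzian(x, x0, fwhm):
    """Area-normalized Lorentzian centered at x0 with the given FWHM."""
    hwhm = 0.5 * fwhm
    return (hwhm / np.pi) / ((x - x0) ** 2 + hwhm ** 2)


def broaden(grid, positions, intensities, fwhm=20.0):
    """Sum of Lorentzians, one per stick, evaluated on `grid` (cm^-1)."""
    spectrum = np.zeros_like(grid, dtype=float)
    for x0, intensity in zip(positions, intensities):
        spectrum += intensity * lorentzian(grid, x0, fwhm)
    return spectrum


def averaged_spectrum(grid, snapshots, fwhm=20.0):
    """Mean of the broadened spectra over snapshots of (positions, intensities)."""
    return np.mean([broaden(grid, pos, inten, fwhm) for pos, inten in snapshots], axis=0)


grid = np.linspace(800.0, 1900.0, 1101)             # 1 cm^-1 spacing
sticks = [([1000.0], [1.0]), ([1100.0], [1.0])]     # two toy snapshots
spectrum = averaged_spectrum(grid, sticks)
print(grid[np.argmax(spectrum)], spectrum.max())
```

Dividing the result by its maximum gives spectra scaled to unit peak intensity, which is convenient when comparing calculated and measured band shapes.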
Electronic absorption, Raman and Resonance Raman spectra were also calculated on the conformers found through the 2D scans of the Ramachandran angles of the peptides, using geometries both in the gas phase and in PCM.

Figure S2 displays the two-dimensional scan of the Φ and Ψ torsion angles of the dipeptide backbone, together with the dihedral distribution functions and their time evolution during the MD sampling. These plots show that many angle combinations have low energies (regions in blue in the 2D scans in Figure S2), thus stabilizing the dipeptides, while other Φ/Ψ pairs are excluded by steric hindrance. In fact, the leucine residue causes the conformational space of NALMA to be more restricted than that of NAGMA, with penalties of up to 22 kcal/mol in the forbidden (inaccessible) regions. Correspondingly, our MD runs suggest that for NAGMA the sampled Ψ values range from -60° to 60° and the most populated Φ angles are around ±180°, while for NALMA Ψ is limited to the (-60° -30°) interval and the most frequent Φ value is around -120°.

Hydration patterns

RDFs
Since most conformers show a C=O···H intramolecular distance of about 3.9 Å (see top panel in Figure S3), the competitive effect seems to be dominated by the interaction with the solvent molecules, from which C=O···Hw and N-H···Ow HBs arise. As a result, in the first solvation shell, two and one water molecules are found to be coordinated to the dipeptides for each C=O···Hw and N-H···Ow contact, respectively. In contrast, the nitrogen atom of the dipeptides does not interact with the hydrogen atoms of the surrounding water molecules.

Figure S2: a) 2D scans of the torsion angles shown in Figure S1, calculated using the model chemistry B3LYP

NBO interactions
Stabilization energies are depicted in Figure S4 with the corresponding motifs. The E_{d−a} values for intramolecular HBs are clearly smaller than those in the intermolecular cases, either with water molecules or with another peptide monomer. This could explain why the β2 conformer was found to be the lowest energy structure in solution, since it leaves the carbonyl groups available to interact with the solvent, maximizing the orbital interactions. In the supermolecule cases, notice that E

Figure S5: Computed UV-Vis absorption spectrum for NAGMA in different environments. Vertical lines mark the position of the absorption maxima for the simulated spectrum, λ max,sim at 175 nm, and for the experimental spectrum, λ max,exp at 190 nm. Red arrows indicate the wavelengths at which RR spectra were measured, and ∆ exp is the difference between λ max,exp and one of the experimental wavelengths used in RR. This ∆ exp is to be taken into account in the RR calculations. a FQ parametrization from Ref. 24

Figure S6: Orbitals of NAGMA involved in the π → π* transitions.

Overall, there are large differences when comparing the spectra to those obtained using B3LYP. With the latter, in both FQ parametrizations, the spectra more closely resemble the experimental data that we collected. Therefore, we chose the B3LYP functional for the entire set of calculations presented in the paper.

Natural Resonance Theory (NRT) Analysis
Figure S12: Main resonance contributors, (a) to (d), to the NAGMA structure obtained from the NRT analysis.

Table S4: Percentage values of the NAGMA main resonance structures shown in Figure S12, obtained from the NRT analysis of the different conformers.

Simulated RR Excitation Profiles
All spectra are scaled such that the maximum intensity is unity. A Lorentzian broadening with a FWHM of 20 cm−1 was used.

Figure S25: UV-resonance Raman spectra of NAGMA (left) and NALMA (right) in vacuo, calculated when their corresponding absorption maxima wavelengths are used to irradiate the system (see Table 1). C5 is the lowest energy conformer of NAGMA and NALMA in the gas phase. RR intensities (in cm2 mol−1 sr−1) were calculated with a damping factor of 200 cm−1 and broadened using Lorentzian functions with FWHM = 20 cm−1. Experimental spectra were collected using 228 nm as excitation wavelength on micro-crystalline phases of the two dipeptides. Horizontal axes: Raman shift (cm−1).